Skip to content

More aggressively detect and disconnect ill-behaving or broken clients#5017

Open
gefjon wants to merge 3 commits into
masterfrom
phoebe/incorrect-args-handling
Open

More aggressively detect and disconnect ill-behaving or broken clients#5017
gefjon wants to merge 3 commits into
masterfrom
phoebe/incorrect-args-handling

Conversation

@gefjon
Copy link
Copy Markdown
Contributor

@gefjon gefjon commented May 13, 2026

Description of Changes

Contains two separate but related fixes:

Reject reducer and procedure args which parse but leave behind trailing bytes

Fixes #4945 .

BSATN parsing generally accepts the case where there are unused trailing bytes in a buffer after parsing a type. This allows both building up compound-typed objects by repeatedly parsing their members, and packing multiple values into the same buffer sequentially.

However, it has an unfortunate consequence when parsing untrusted inputs: if a client submits an input e.g. for a reducer call which is not of the expected type, but has a prefix that parses at the expected type, a direct use of the BSATN parser will accept it and silently ignore the suffix. One simple example is a client attempting to submit an i64 when SpacetimeDB expects an i32, resulting in the high 4 bytes of the client submission being ignored, potentially resulting in a different number being parsed than the one submitted.

In this commit, we check when parsing user-submitted function arguments that not only did the parse succeed, but that it also consumed the entire input. If the entire input was not consumed, we treat it as an error in the class described above.

Disconnect clients which issue invalid function calls

Fixes #5002 , #4943 . Prior to this change, clients who sent invalid or ill-typed function calls would receive error responses, but would not be disconnected. Clients which didn't handle those errors would continue sending invalid function calls continuously, consuming server time without performing useful work.

With this change, errors encountered when calling reducers or procedures are divided into three categories:

  1. Those that send an error response and do not terminate the client, as previously.
  2. Those that close the connection with the close code Invalid, meaning the client is broken and should not attempt to reconnect. Invalid and ill-typed function call arguments, as described above, are in this category.
  3. Those that close the connection with the close code Again, meaning the client should attempt to reconnect. These are things that (may) indicate that a leader failover has occurred and the leader replica is now housed on a different SpacetimeDB node.

Notably, this PR does not include any changes to any of the client SDKs. We may recognize the Again close code and attempt to reconnect in the future, but do not do so currently.

As part of this change, I introduced a new mechanism by which the core crate can disconnect a ClientConnectionSender, and changed the existing disconnect when the client exceeds its queue limit to use that new mechanism.

API and ABI breaking changes

I'm not sure "ABI break" is technically the correct label, but this is from a certain perspective a breaking change to the raw WebSocket API, in the sense that we now reject some requests that previously we would accept, and close some connections that we would previously have left open.

Expected complexity level and risk

For the user-facing changes

2: possible some user somewhere has been relying on this behavior and sending unused bytes in the arguments part of a CallReducer or CallProcedure message somehow. Most likely this would be a user of the raw WebSocket format, but it could also be someone with outdated generated bindings in a benign way.

For the internal change to ClientConnectionSender and the WebSocket loop

3: this code is tricky. In particular, there's some concern that this change breaks ping/ponging, disconnecting clients which fill their queues, or some other aspect of the connection lifecycle.

Testing

  • Added some unit tests.
  • Manually tested that clients which exceed their queue length are still disconnected in a timely fashion.
  • Manually build a client with incorrect bindings for a reducer's arguments in the way described above and encounter the new error and disconnection.

BSATN parsing generally accepts the case
where there are unused trailing bytes in a buffer after parsing a type.
This allows both building up compound-typed objects by repeatedly parsing their members,
and packing multiple values into the same buffer sequentially.

However, it has an unfortunate consequence when parsing untrusted inputs:
if a client submits an input e.g. for a reducer call which is not of the expected type,
but has a prefix that parses at the expected type,
a direct use of the BSATN parser will accept it and silently ignore the suffix.
One simple example is a client attempting to submit an i64 when SpacetimeDB expects an i32,
resulting in the high 4 bytes of the client submission being ignored,
potentially resulting in a different number being parsed than the one submitted.

In this commit, we check when parsing user-submitted function arguments
that not only did the parse succeed, but that it also consumed the entire input.
If the entire input was not consumed, we treat it as an error in the class described above.
@gefjon gefjon added the abi-break A PR that makes an ABI breaking change label May 13, 2026
gefjon added 2 commits May 15, 2026 12:45
When a client sends an invalid `CallReducer` or `CallProcedure` message,
previously, we'd send them an error response but continue their connection.
That was silly; there's significant classes of error which mean the connection is broken
and should be killed.
With this change, a client-supplied invalid `CallReducer` will result in a disconnect.
A client-supplied invalid `CallProcedure` will panic due to hitting a `todo!()`
which will be filled in in a subsequent commit.

As part of this change, I defined `CloseFrame` in spacetimedb-core
as a mirror to tunstenite's CloseFrame.
This led to a minor audit of our close codes, and revealed one incorrect use:
`CloseCode::Error` is defined by the spec as "internal error",
but we were sending it in response to a client error.
I have replaced it with `CloseCode::Protocol`, which is "protocol error".
This commit adds and implements `ClientConnectionSender::disconnect`,
which does what you expect.
This required adding a new member to `ClientConnectionSender`, the `disconnect_tx`,
along which disconnect messages are sent.
Adding the `disconnect_tx` also made it convenient to tidy the `UnorderedWsMessage` channel
and rename that concept to `WsControlMessage`.
@gefjon gefjon changed the title Reject BSATN function args where the buffer is too long More aggressively detect and disconnect ill-behaving or broken clients May 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

abi-break A PR that makes an ABI breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Disconnect clients who send ill-typed or unparseable function args server side client message args validation is wrong sometimes

1 participant